Analysis of High-Throughput Sequencing and Annotation Strategies for Phage Genomes
Authors
Abstract
BACKGROUND Bacterial viruses (phages) play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage.
METHODOLOGY/PRINCIPAL FINDINGS To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles), and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL) or of a whole genome shotgun library (WGSL), or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling.
CONCLUSIONS/SIGNIFICANCE These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.
Similar resources
Finding the Needle in the Haystack: Computational Strategies for Discovering Regulatory Sequences in Genomes
Annotating the noncoding portion of the human genome and identifying functional regulatory elements embedded in its sequence creates a continuing challenge. Historically, the functional characterization of regulatory elements has been slow, labor-intensive and inadequate to keep up with the demands of whole–genome analysis. Recently, there has been an explosion of computational techniques and t...
Assessing Illumina technology for the high-throughput sequencing of bacteriophage genomes
Bacteriophages are the most abundant biological entities on the planet, playing crucial roles in the shaping of bacterial populations. Phages have smaller genomes than their bacterial hosts, yet there are currently fewer fully sequenced phage than bacterial genomes. We assessed the suitability of Illumina technology for high-throughput sequencing and subsequent assembly of phage genomes. In sil...
The automatic annotation of bacterial genomes
With the development of ultra-high-throughput technologies, the cost of sequencing bacterial genomes has been vastly reduced. As more genomes are sequenced, less time can be spent manually annotating those genomes, resulting in an increased reliance on automatic annotation pipelines. However, automatic pipelines can produce inaccurate genome annotation and their results often require manual cur...
Sequence analysis of genes and genomes.
A major step towards understanding of the genetic basis of an organism is the complete sequence determination of all genes in its genome. The development of powerful techniques for DNA sequencing has enabled sequencing of large amounts of gene fragments and even complete genomes. Important new techniques for physical mapping, DNA sequencing and sequence analysis have been developed. To increase...
MToolBox: a highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing
MOTIVATION The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has raised the demand for effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. To this purpose, we developed MToolBox, a hi...
Journal title:
Volume 5, Issue
Pages -
Publication date 2010